{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ur8xi4C7S06n"
      },
      "outputs": [],
      "source": [
        "# Copyright 2025 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JAPoU8Sm5E6e"
      },
      "source": [
        "# Build and deploy a Hugging Face smolagent using DeepSeek-r1 on Vertex AI\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/open-models/use-cases/vertex_ai_deepseek_smolagents.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fopen-models%2Fuse-cases%2Fvertex_ai_deepseek_smolagents.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/open-models/use-cases/vertex_ai_deepseek_smolagents.ipynb\">\n",
        "      <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/use-cases/vertex_ai_deepseek_smolagents.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://raw.githubusercontent.com/primer/octicons/refs/heads/main/icons/mark-github-24.svg\" alt=\"GitHub logo\"><br> View on GitHub\n",
        "    </a>\n",
        "  </td>\n",
        "</table>\n",
        "\n",
        "<div style=\"clear: both;\"></div>\n",
        "\n",
        "\n",
        "\n",
        "<b>Share to:</b>\n",
        "\n",
        "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/use-cases/vertex_ai_deepseek_smolagents.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/use-cases/vertex_ai_deepseek_smolagents.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/use-cases/vertex_ai_deepseek_smolagents.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/use-cases/vertex_ai_deepseek_smolagents.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/use-cases/vertex_ai_deepseek_smolagents.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
        "</a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "84f0f73a0f76"
      },
      "source": [
        "| | |\n",
        "|-|-|\n",
        "| Author(s) |  [Ivan Nardini](https://github.com/inardini) |"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tvgnzT1CKxrO"
      },
      "source": [
        "## Overview\n",
        "\n",
 [DeepSeek-R1 from">
        "> [DeepSeek-R1 from DeepSeek](https://huggingface.co/deepseek-ai/DeepSeek-R1) is a powerful language model developed with a focus on enhancing reasoning capabilities. The release includes DeepSeek-R1-Zero, DeepSeek-R1, and a collection of six distilled, dense models derived from DeepSeek-R1. These distilled models, based on the popular Llama and Qwen architectures, offer a range of sizes and capabilities to suit diverse research needs.\n",
        "\n",
        "> [Hugging Face's smolagents](https://huggingface.co/docs/smolagents/en/index) library provides a lightweight and flexible framework for building and experimenting with language agents.\n",
        "\n",
        "> [Vertex AI](https://cloud.google.com/vertex-ai/docs) provides a comprehensive platform for the entire machine learning lifecycle. It empowers you to build, train, and deploy ML models and AI applications, including customizing powerful large language models (LLMs).\n",
        "\n",
        "This notebook showcases how to deploy DeepSeek R1 Distill Qwen 7B from the Hugging Face Hub on Vertex AI using Vertex AI Model Garden. It also shows how to prototype and deploy a simple agent using Hugging Face's smolagents library on Vertex AI Reasoning Engine.\n",
        "\n",
        "\n",
        "By the end of this notebook, you will learn how to:\n",
        "\n",
        "- Register and deploy Deepseek-r1 from the Hugging Face Hub on Vertex AI\n",
        "- Prototype and evaluate an Deepseek-r1 agent on Vertex AI Reasoning Engine\n",
        "- Prototype and deploy an Deepseek-r1 agent on Vertex AI Reasoning Engine\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "61RBz8LLbxCR"
      },
      "source": [
        "## Get started"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "No17Cw5hgx12"
      },
      "source": [
        "### Install Vertex AI SDK and other required packages\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "tFy3H3aPgx12"
      },
      "outputs": [],
      "source": [
        "%pip install --upgrade --user --quiet \"google-cloud-aiplatform[reasoningengine, evaluation]\" \"openai\" \"smolagents\" \\\n",
        "    \"cloudpickle==3.0.0\" \\\n",
        "    \"pydantic>=2.10\" \\\n",
        "    \"requests\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "R5Xep4W9lq-Z"
      },
      "source": [
        "### Restart runtime\n",
        "\n",
        "To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.\n",
        "\n",
        "The restart might take a minute or longer. After it's restarted, continue to the next step."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "XRvKdaPDTznN"
      },
      "outputs": [],
      "source": [
        "import IPython\n",
        "\n",
        "app = IPython.Application.instance()\n",
        "app.kernel.do_shutdown(True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SbmM4z7FOBpM"
      },
      "source": [
        "<div class=\"alert alert-block alert-warning\">\n",
        "<b>⚠️ The kernel is going to restart. In Colab or Colab Enterprise, you might see an error message that says \"Your session crashed for an unknown reason.\" This is expected. Wait until it's finished before continuing to the next step. ⚠️</b>\n",
        "</div>\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dmWOrTJ3gx13"
      },
      "source": [
        "### Authenticate your notebook environment (Colab only)\n",
        "\n",
        "If you're running this notebook on Google Colab, run the cell below to authenticate your environment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "NyKGtVQjgx13"
      },
      "outputs": [],
      "source": [
        "import sys\n",
        "\n",
        "if \"google.colab\" in sys.modules:\n",
        "    from google.colab import auth\n",
        "\n",
        "    auth.authenticate_user()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WZeNZkeq3wS6"
      },
      "source": [
        "### Authenticate your Hugging Face account\n",
        "\n",
        "Then you can install the `huggingface_hub` that comes with a CLI that will be used for the authentication with the token generated in advance. So that then the token can be safely retrieved via `huggingface_hub.get_token`.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "zB-DuCZj30hb"
      },
      "outputs": [],
      "source": [
        "from huggingface_hub import interpreter_login\n",
        "\n",
        "interpreter_login()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jTYJgjRN34RH"
      },
      "source": [
        "Read more about [Hugging Face Security](https://huggingface.co/docs/hub/en/security), specifically about [Hugging Face User Access Tokens](https://huggingface.co/docs/hub/en/security-tokens).\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DF4l8DTdWgPY"
      },
      "source": [
        "### Set Google Cloud project information and initialize Vertex AI SDK\n",
        "\n",
        "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n",
        "\n",
        "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Nqwi-5ufWp_B"
      },
      "outputs": [],
      "source": [
        "# Use the environment variable if the user doesn't provide Project ID.\n",
        "import os\n",
        "\n",
        "import vertexai\n",
        "\n",
        "PROJECT_ID = \"[your-project-id]\"  # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
        "\n",
        "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
        "    PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n",
        "\n",
        "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")\n",
        "\n",
        "BUCKET_NAME = \"[your-bucket-name]\"  # @param {type: \"string\", placeholder: \"[your-bucket-name]\", isTemplate: true}\n",
        "\n",
        "if not BUCKET_NAME or BUCKET_NAME == \"[your-bucket-name]\":\n",
        "    BUCKET_NAME = f\"{PROJECT_ID}-bucket\"\n",
        "\n",
        "BUCKET_URI = f\"gs://{BUCKET_NAME}\"\n",
        "\n",
        "! gsutil mb -p $PROJECT_ID -l $LOCATION $BUCKET_URI\n",
        "\n",
        "vertexai.init(project=PROJECT_ID, location=LOCATION, staging_bucket=BUCKET_URI)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5303c05f7aa6"
      },
      "source": [
        "## Import libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "6fc324893334"
      },
      "outputs": [],
      "source": [
        "import random\n",
        "import string\n",
        "import threading\n",
        "import time\n",
        "\n",
        "from IPython.display import HTML, Markdown, display\n",
        "import google.auth\n",
        "from google.auth import default\n",
        "import google.auth.transport.requests\n",
        "from google.cloud import aiplatform\n",
        "from huggingface_hub import get_token\n",
        "import openai\n",
        "import pandas as pd\n",
        "import plotly.graph_objects as go\n",
        "from smolagents import ChatMessage, CodeAgent, Model\n",
        "from smolagents.agents import ActionStep\n",
        "from smolagents.tools import Tool\n",
        "from vertexai.preview import reasoning_engines\n",
        "from vertexai.preview.evaluation import EvalTask"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "H_PZhSWSqrA6"
      },
      "source": [
        "## Helpers"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "R0uBN4Kpqtmv"
      },
      "outputs": [],
      "source": [
        "def get_id(length: int = 8) -> str:\n",
        "    \"\"\"Generate a uuid of a specified length (default=8).\"\"\"\n",
        "    return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n",
        "\n",
        "\n",
        "def parse_smolagents_output_to_dictionary(agent, agent_outcome):\n",
        "    \"\"\"\n",
        "    Parse SmolAgent output into a structured dictionary format.\n",
        "    \"\"\"\n",
        "\n",
        "    final_output = {\"response\": str(agent_outcome), \"predicted_trajectory\": []}\n",
        "\n",
        "    try:\n",
        "        # Access the agent's action logs\n",
        "        for log in agent.logs:\n",
        "            # First check if the log is an ActionStep\n",
        "            if isinstance(log, ActionStep):\n",
        "                # Then check if it has tool_calls\n",
        "                if hasattr(log, \"tool_calls\"):\n",
        "                    for tool_call in log.tool_calls:\n",
        "                        # Parse tool arguments - split by newline and create key-value pairs\n",
        "                        args_list = [\n",
        "                            arg.strip()\n",
        "                            for arg in tool_call.arguments.split(\"\\n\")\n",
        "                            if arg.strip()\n",
        "                        ]\n",
        "                        tool_args = {\n",
        "                            f\"arg_{idx}\": arg\n",
        "                            for idx, arg in enumerate(\n",
        "                                args_list\n",
        "                            )  # Using enumerate ensures sequential numbering\n",
        "                        }\n",
        "\n",
        "                        # Create tool info dictionary\n",
        "                        tool_info = {\n",
        "                            \"tool_name\": tool_call.name,\n",
        "                            \"tool_input\": tool_args,\n",
        "                        }\n",
        "                        final_output[\"predicted_trajectory\"].append(tool_info)\n",
        "\n",
        "    except Exception as e:\n",
        "        final_output[\"error\"] = f\"Error parsing tools results: {str(e)}\"\n",
        "\n",
        "    return final_output\n",
        "\n",
        "\n",
        "def format_output_as_markdown(output: dict) -> str:\n",
        "    \"\"\"\n",
        "    Convert the output dictionary to a detailed execution report.\n",
        "\n",
        "    Args:\n",
        "        output: Dictionary containing response and predicted trajectory\n",
        "\n",
        "    Returns:\n",
        "        str: Formatted string with detailed execution information\n",
        "    \"\"\"\n",
        "    report = \"📊 Execution Report\\n\"\n",
        "    report += \"=\" * 50 + \"\\n\\n\"\n",
        "\n",
        "    report += \"🎯 Final Result:\\n\"\n",
        "    report += f\"{output['response']}\\n\\n\"\n",
        "\n",
        "    if output[\"predicted_trajectory\"]:\n",
        "        report += \"🔍 Execution Details:\\n\"\n",
        "        report += \"-\" * 50\n",
        "        for idx, call in enumerate(output[\"predicted_trajectory\"], 1):\n",
        "            report += f\"\\n📌 Operation {idx}:\\n\"\n",
        "            report += f\"Tool: {call['tool_name']}\\n\"\n",
        "            report += \"Args:\\n\"\n",
        "            for arg_name, command in call[\"tool_input\"].items():\n",
        "                report += f\"  ▶ {command}\\n\"\n",
        "            report += \"-\" * 50 + \"\\n\"\n",
        "\n",
        "    return report\n",
        "\n",
        "\n",
        "def display_dataframe_rows(\n",
        "    df: pd.DataFrame,\n",
        "    columns: list[str] | None = None,\n",
        "    num_rows: int = 3,\n",
        "    display_drilldown: bool = False,\n",
        ") -> None:\n",
        "    \"\"\"Displays a subset of rows from a DataFrame, optionally including a drill-down view.\"\"\"\n",
        "\n",
        "    if columns:\n",
        "        df = df[columns]\n",
        "\n",
        "    base_style = \"font-family: monospace; font-size: 14px; white-space: pre-wrap; width: auto; overflow-x: auto;\"\n",
        "    header_style = base_style + \"font-weight: bold;\"\n",
        "\n",
        "    for _, row in df.head(num_rows).iterrows():\n",
        "        for column in df.columns:\n",
        "            display(\n",
        "                HTML(\n",
        "                    f\"<span style='{header_style}'>{column.replace('_', ' ').title()}: </span>\"\n",
        "                )\n",
        "            )\n",
        "            display(HTML(f\"<span style='{base_style}'>{row[column]}</span><br>\"))\n",
        "\n",
        "        display(HTML(\"<hr>\"))\n",
        "\n",
        "        if (\n",
        "            display_drilldown\n",
        "            and \"predicted_trajectory\" in df.columns\n",
        "            and \"reference_trajectory\" in df.columns\n",
        "        ):\n",
        "            display_drilldown(row)\n",
        "\n",
        "\n",
        "def display_eval_report(eval_result: pd.DataFrame) -> None:\n",
        "    \"\"\"Display the evaluation results.\"\"\"\n",
        "    metrics_df = pd.DataFrame.from_dict(eval_result.summary_metrics, orient=\"index\").T\n",
        "    display(Markdown(\"### Summary Metrics\"))\n",
        "    display(metrics_df)\n",
        "\n",
        "    display(Markdown(\"### Row-wise Metrics\"))\n",
        "    display(eval_result.metrics_table)\n",
        "\n",
        "\n",
        "def plot_bar_plot(\n",
        "    eval_result: pd.DataFrame, title: str, metrics: list[str] = None\n",
        ") -> None:\n",
        "    fig = go.Figure()\n",
        "    data = []\n",
        "\n",
        "    summary_metrics = eval_result.summary_metrics\n",
        "    if metrics:\n",
        "        summary_metrics = {\n",
        "            k: summary_metrics[k]\n",
        "            for k, v in summary_metrics.items()\n",
        "            if any(selected_metric in k for selected_metric in metrics)\n",
        "        }\n",
        "\n",
        "    data.append(\n",
        "        go.Bar(\n",
        "            x=list(summary_metrics.keys()),\n",
        "            y=list(summary_metrics.values()),\n",
        "            name=title,\n",
        "        )\n",
        "    )\n",
        "\n",
        "    fig = go.Figure(data=data)\n",
        "\n",
        "    # Change the bar mode\n",
        "    fig.update_layout(barmode=\"group\")\n",
        "    fig.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e43229f3ad4f"
      },
      "source": [
        "## Set model\n",
        "\n",
        "Set the model ID from Hugging Face Hub. In this case, you use DeepSeek-R1-Distill-Qwen-7B, a dense model distilled from DeepSeek-R1 good at math."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "cf93d5f0ce00"
      },
      "outputs": [],
      "source": [
        "MODEL_ID = \"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B\"  # @param {type:\"string\", isTemplate: true}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "H0Ze75ouQrdZ"
      },
      "source": [
        "## Register and Deploy DeepSeek model on Vertex AI\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dJsGNCv4T6R-"
      },
      "source": [
        "### Register a DeepSeek model on Vertex AI Model Registry\n",
        "\n",
        "Deploying a DeepSeek model on Vertex AI begins with importing the model into the [Vertex AI Model Registry](https://www.google.com/search?q=model+registry+vertex+ai&oq=model+registry+vertex+ai&gs_lcrp=EgZjaHJvbWUqBwgAEAAYgAQyBwgAEAAYgAQyCggBEAAYgAQYogQyBggCEEUYPDIGCAMQRRg8MgYIBBBFGDwyBggFEEUYQDIGCAYQRRhAMgYIBxBFGEDSAQg2MzMxajBqN6gCALACAA&sourceid=chrome&ie=UTF-8), a central hub for managing your ML model lifecycle.  This registry stores model configurations, enabling streamlined organization, tracking, and versioning.  \n",
        "\n",
        "The `aiplatform.Model.upload` method specifies the display name, the serving container image URI (pointing to the vLLM inference container on Vertex AI Model Garden), and arguments for the vLLM API server. Key arguments include the model name, tensor parallelism size, maximum model length, and enforcement of eager execution.\n",
        "\n",
        "It also defines the serving container port, predict route, health route, and crucial environment variables, notably the Hugging Face token for downloading the model from the Hugging Face Hub.\n",
        "\n",
        "See the [vLLM documentation](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#quickstart-online) and [aiplatform.Model.upload](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Model#google_cloud_aiplatform_Model_upload) Python reference for a complete list of arguments."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Pc-Owc_7WiZb"
      },
      "outputs": [],
      "source": [
        "deepseek_model = aiplatform.Model.upload(\n",
        "    display_name=MODEL_ID.replace(\"/\", \"--\").lower(),\n",
        "    serving_container_image_uri=\"us-docker.pkg.dev/deeplearning-platform-release/vertex-model-garden/vllm-inference.cu121.0-6.ubuntu2204.py310\",\n",
        "    serving_container_args=[\n",
        "        \"python\",\n",
        "        \"-m\",\n",
        "        \"vllm.entrypoints.api_server\",\n",
        "        \"--host=0.0.0.0\",\n",
        "        \"--port=8080\",\n",
        "        f\"--model={MODEL_ID}\",\n",
        "        # Hugging Face configuration\n",
        "        \"--tensor-parallel-size=1\",\n",
        "        \"--max-model-len=16384\",\n",
        "        \"--enforce-eager\",\n",
        "    ],\n",
        "    serving_container_ports=[8080],\n",
        "    serving_container_predict_route=\"/generate\",\n",
        "    serving_container_health_route=\"/ping\",\n",
        "    serving_container_environment_variables={\n",
        "        \"HF_TOKEN\": get_token(),\n",
        "        \"DEPLOY_SOURCE\": \"notebook\",\n",
        "    },\n",
        ")\n",
        "deepseek_model.wait()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "c427c0a87016"
      },
      "source": [
        "### Deploy DeepSeek model on Vertex AI Prediction\n",
        "\n",
        "After the model is registered on Vertex AI, you can deploy the model to an endpoint.\n",
        "\n",
        "First create the endpoint with aiplatform.Endpoint.create method. Then you deploys the model to this endpoint, specifying the machine type (`g2-standard-24`), accelerator type (`NVIDIA_L4`), and the number of accelerators (`2`).\n",
        "\n",
        "> This deployment configuration is based on [Vertex AI Model Garden](https://console.cloud.google.com/vertex-ai/model-garden/featured-partners/hugging-face). Be sure you have enough GPU quota for deploying the model.\n",
        "\n",
        "For more information on the supported `aiplatform.Model.deploy` arguments, you can check its [Python reference](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Model#google_cloud_aiplatform_Model_deploy)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "sxI3SsHxVYn4"
      },
      "outputs": [],
      "source": [
        "deepseek_endpoint = aiplatform.Endpoint.create(\n",
        "    display_name=MODEL_ID.replace(\"/\", \"--\").lower() + \"-endpoint\"\n",
        ")\n",
        "\n",
        "deployed_deepseek_model = deepseek_model.deploy(\n",
        "    endpoint=deepseek_endpoint,\n",
        "    machine_type=\"g2-standard-12\",\n",
        "    accelerator_type=\"NVIDIA_L4\",\n",
        "    accelerator_count=1,\n",
        "    sync=False,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UEY206Oj0U2G"
      },
      "source": [
 Note that the model deployment">
        "> Note that deploying the model on Vertex AI can take around 20 minutes.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8lIr8uko04T_"
      },
      "source": [
        "### Generate predictions with Vertex AI API\n",
        "\n",
        "After deploying the model, you can use the `aiplatform.Endpoint.predict` method to generate online predictions. This sends requests to the deployed endpoint, utilizing the `/predict` route defined within the container and adhering to Vertex AI's input/output payload formatting requirements.\n",
        "\n",
        "> Note the instance request format is aligned the [vLLM OpenAI Completions API interface](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-completions-api-with-vllm)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "6lu1C-nXSvH-"
      },
      "outputs": [],
      "source": [
        "prediction_request = {\n",
        "    \"instances\": [\n",
        "        {\n",
        "            \"@requestFormat\": \"chatCompletions\",\n",
        "            \"messages\": [\n",
        "                {\n",
        "                    \"role\": \"user\",\n",
        "                    \"content\": \"Count the number of 'r' in the word Strawberry\",\n",
        "                }\n",
        "            ],\n",
        "            \"max_tokens\": 2048,\n",
        "            \"temperature\": 0.7,\n",
        "        }\n",
        "    ]\n",
        "}\n",
        "\n",
        "output = deployed_deepseek_model.predict(instances=prediction_request[\"instances\"])\n",
        "for prediction in output.predictions[0]:\n",
        "    print(\"------- DeepSeek prediction -------\")\n",
        "    print(prediction[\"message\"][\"content\"])\n",
        "    print(\"---------------------------------\\n\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iljyx7y5Nbl1"
      },
      "source": [
        "## Build a simple math agent with Hugging Face's smolagents\n",
        "\n",
        "With your DeepSeek model now deployed on Vertex AI, let's leverage its mathematical capabilities. The `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` excels at mathematical reasoning, making it an ideal tool for an agent designed to verify math results.  \n",
        "\n",
        "Let's create a simple agent that combines the strengths of Gemini's function calling for orchestration and answer generation with DeepSeek's verification abilities on Vertex AI. This agent will use Hugging Face's smol-agents library."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jZ5l0fTr3wAu"
      },
      "source": [
        "### Create a VertexAIServerModel class\n",
        "\n",
        "To integrate Gemini with Vertex AI for agent development, a custom [Model](https://huggingface.co/docs/smolagents/v1.5.0/en/reference/agents#models) class is required. This class will represent the Gemini text generation model, serving as the engine for your agent."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "T_yzMnYZ4tz8"
      },
      "source": [
        "> Note the code is based on the official [Model](https://github.com/huggingface/smolagents/blob/main/src/smolagents/models.py) implementation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "kvClfddqLxcR"
      },
      "outputs": [],
      "source": [
        "class VertexAIServerModel(Model):\n",
        "    \"\"\"Connect to an OpenAI-compatible chat completions endpoint on Vertex AI.\"\"\"\n",
        "\n",
        "    def __init__(\n",
        "        self, model_id: str, project_id: str, location: str, endpoint_id: str, **kwargs\n",
        "    ):\n",
        "        # Try to import dependencies\n",
        "        try:\n",
        "            from google.auth import default\n",
        "        except ModuleNotFoundError:\n",
        "            raise ModuleNotFoundError(\n",
        "                \"Please install 'openai', 'google-auth' and 'requests' to use VertexAIServerModel, as described in the official documentation: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/call-vertex-using-openai-library\"\n",
        "            ) from None\n",
        "\n",
        "        # Initialize parent class with any additional keyword arguments\n",
        "        super().__init__(**kwargs)\n",
        "        self.model_id = model_id\n",
        "        self.project_id = project_id\n",
        "        self.location = location\n",
        "        self.endpoint_id = endpoint_id\n",
        "        self.kwargs = kwargs\n",
        "        self._refresh_task = None\n",
        "\n",
        "        # Initialize credentials and set up Google Cloud authentication with required permissions\n",
        "        self.credentials, _ = default(\n",
        "            scopes=[\"https://www.googleapis.com/auth/cloud-platform\"]\n",
        "        )\n",
        "        self._refresh_token()\n",
        "        self._setup_client()\n",
        "        self._start_refresh_loop()\n",
        "\n",
        "    def __call__(\n",
        "        self,\n",
        "        messages: list[dict[str, str]],\n",
        "        **kwargs,\n",
        "    ) -> ChatMessage:\n",
        "\n",
        "        # Prepare the API call parameters; per-call arguments such as\n",
        "        # stop_sequences override the defaults given at construction time\n",
        "        completion_kwargs = self._prepare_completion_kwargs(\n",
        "            messages=messages,\n",
        "            model=self.model_id,\n",
        "            **{**self.kwargs, **kwargs},\n",
        "        )\n",
        "\n",
        "        # Make the API call to Vertex AI\n",
        "        response = self.client.chat.completions.create(**completion_kwargs)\n",
        "        self.last_input_token_count = response.usage.prompt_tokens\n",
        "        self.last_output_token_count = response.usage.completion_tokens\n",
        "\n",
        "        # Convert API response to ChatMessage format\n",
        "        message = ChatMessage.from_dict(\n",
        "            response.choices[0].message.model_dump(\n",
        "                include={\"role\", \"content\", \"tool_calls\"}\n",
        "            )\n",
        "        )\n",
        "        return message\n",
        "\n",
        "    def _refresh_token(self):\n",
        "        \"\"\"Refresh the Google Cloud token\"\"\"\n",
        "        try:\n",
        "            self.credentials.refresh(google.auth.transport.requests.Request())\n",
        "            self._setup_client()\n",
        "        except Exception as e:\n",
        "            print(f\"Token refresh failed: {e}\")\n",
        "\n",
        "    def _setup_client(self):\n",
        "        \"\"\"Setup OpenAI client with current credentials\"\"\"\n",
        "        self.client = openai.OpenAI(\n",
        "            base_url=f\"https://{self.location}-aiplatform.googleapis.com/v1beta1/projects/{self.project_id}/locations/{self.location}/endpoints/{self.endpoint_id}\",\n",
        "            api_key=self.credentials.token,\n",
        "        )\n",
        "\n",
        "    def _start_refresh_loop(self):\n",
        "        \"\"\"Start the token refresh loop\"\"\"\n",
        "\n",
        "        def refresh_loop():\n",
        "            while True:\n",
        "                time.sleep(3600)\n",
        "                self._refresh_token()\n",
        "\n",
        "        self._refresh_thread = threading.Thread(target=refresh_loop, daemon=True)\n",
        "        self._refresh_thread.start()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9HX1BrGz30_i"
      },
      "source": [
        "### Create a math tool using a DeepSeek model\n",
        "\n",
        "In the context of language agents, a tool is a self-contained function the agent can call. For a language model to use a tool effectively, the tool needs a well-defined API: a name, a concise description, the input types with their descriptions, and an output type.\n",
        "\n",
        "To use the DeepSeek model deployed on Vertex AI as a tool in `smolagents`, you need a custom [Tool](https://huggingface.co/docs/smolagents/en/guided_tour#tools) class. This class wraps the DeepSeek endpoint and gives the agent a means to take action. In this case, the tool verifies math results.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "T6wciAad4MAB"
      },
      "outputs": [],
      "source": [
        "class DeepSeekMathVerifierTool(Tool):\n",
        "    \"\"\"A tool that verifies math responses\"\"\"\n",
        "\n",
        "    name = \"math_verifier\"\n",
        "    description = \"\"\"This is a tool that verifies math responses\"\"\"\n",
        "    inputs = {\n",
        "        \"content\": {\n",
        "            \"type\": \"string\",\n",
        "            \"description\": \"a text containing math\",\n",
        "        }\n",
        "    }\n",
        "    output_type = \"string\"\n",
        "\n",
        "    def __init__(self, project_id: str, location: str, endpoint_id: str, **kwargs):\n",
        "        try:\n",
        "            # `default` is used below to load Application Default Credentials\n",
        "            from google.auth import default\n",
        "            from google.cloud import aiplatform\n",
        "            import vertexai\n",
        "        except ModuleNotFoundError:\n",
        "            raise ModuleNotFoundError(\n",
        "                \"Please install 'google-auth', 'vertexai' and 'google-cloud-aiplatform' to use DeepSeekMathVerifierTool\"\n",
        "            ) from None\n",
        "\n",
        "        super().__init__()\n",
        "        self.endpoint_id = endpoint_id\n",
        "        self.project_id = project_id\n",
        "        self.location = location\n",
        "        self.kwargs = kwargs\n",
        "        self._refresh_task = None\n",
        "\n",
        "        # Initialize credentials and set up Google Cloud authentication with required permissions\n",
        "        self.credentials, _ = default(\n",
        "            scopes=[\"https://www.googleapis.com/auth/cloud-platform\"]\n",
        "        )\n",
        "        self._refresh_token()\n",
        "        self._start_refresh_loop()\n",
        "\n",
        "        # Initialize Vertex ai session and the endpoint\n",
        "        vertexai.init(\n",
        "            project=self.project_id,\n",
        "            location=self.location,\n",
        "            credentials=self.credentials,\n",
        "            **self.kwargs,\n",
        "        )\n",
        "        self.endpoint = aiplatform.Endpoint(\n",
        "            endpoint_name=f\"projects/{self.project_id}/locations/{self.location}/endpoints/{self.endpoint_id}\"\n",
        "        )\n",
        "\n",
        "    def forward(self, content: str):\n",
        "        \"\"\"Submit the prediction request\"\"\"\n",
        "        content = str(content)\n",
        "        prediction_request = {\n",
        "            \"instances\": [\n",
        "                {\n",
        "                    \"@requestFormat\": \"chatCompletions\",\n",
        "                    \"messages\": [{\"role\": \"user\", \"content\": content}],\n",
        "                }\n",
        "            ]\n",
        "        }\n",
        "\n",
        "        try:\n",
        "            output = self.endpoint.predict(instances=prediction_request[\"instances\"])\n",
        "        except Exception as e:\n",
        "            print(f\"Prediction failed: {e}\")\n",
        "            return None\n",
        "        prediction = output.predictions[0][0][\"message\"][\"content\"]\n",
        "        return prediction\n",
        "\n",
        "    def _refresh_token(self):\n",
        "        \"\"\"Refresh the Google Cloud token\"\"\"\n",
        "        try:\n",
        "            self.credentials.refresh(google.auth.transport.requests.Request())\n",
        "        except Exception as e:\n",
        "            print(f\"Token refresh failed: {e}\")\n",
        "\n",
        "    def _start_refresh_loop(self):\n",
        "        \"\"\"Start the token refresh loop\"\"\"\n",
        "\n",
        "        def refresh_loop():\n",
        "            while True:\n",
        "                time.sleep(3600)\n",
        "                self._refresh_token()\n",
        "\n",
        "        self._refresh_thread = threading.Thread(target=refresh_loop, daemon=True)\n",
        "        self._refresh_thread.start()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gDoVJbEo920U"
      },
      "source": [
        "### Assemble the agent\n",
        "\n",
        "Having defined both the model and the tool, we can now assemble a basic agent.  \n",
        "\n",
        "`smolagents` provides a default implementation called `CodeAgent`, which is designed to write and execute Python code at each step of its process.  \n",
        "\n",
        "For more detailed information on agent construction and capabilities, refer to the `smolagents` [Agent](https://huggingface.co/docs/smolagents/en/guided_tour#codeagent-and-toolcallingagent) documentation.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mqZRJTL96OIa"
      },
      "outputs": [],
      "source": [
        "endpoint_id = next(\n",
        "    (\n",
        "        endpoint.name\n",
        "        for endpoint in aiplatform.Endpoint.list()\n",
        "        if endpoint.display_name == MODEL_ID.replace(\"/\", \"--\").lower() + \"-endpoint\"\n",
        "    ),\n",
        "    None,\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "OO4-Smd3C63D"
      },
      "outputs": [],
      "source": [
        "model = VertexAIServerModel(\n",
        "    model_id=\"google/gemini-2.0-flash\",\n",
        "    endpoint_id=\"openapi\",\n",
        "    project_id=PROJECT_ID,\n",
        "    location=LOCATION,\n",
        ")\n",
        "\n",
        "tools = [\n",
        "    DeepSeekMathVerifierTool(\n",
        "        endpoint_id=endpoint_id, project_id=PROJECT_ID, location=LOCATION\n",
        "    )\n",
        "]\n",
        "\n",
        "agent = CodeAgent(model=model, tools=tools, add_base_tools=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "CKBraQd5GefX"
      },
      "source": [
        "### Test the agent\n",
        "\n",
        "With the agent assembled, you can test it: first with a simple greeting, then with a task that asks the agent to verify its answer."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "cITl7egiTWl-"
      },
      "outputs": [],
      "source": [
        "response = agent.run(\"Hello! How are you?\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "x7MlhZZUyxgy"
      },
      "outputs": [],
      "source": [
        "print(format_output_as_markdown(parse_smolagents_output_to_dictionary(agent, response)))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "WtNKWuWwDE4Q"
      },
      "outputs": [],
      "source": [
        "response = agent.run(\n",
        "    \"Count the number of 'r' in the word Strawberry. Verify the answer\"\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "aWly89543ZR8"
      },
      "outputs": [],
      "source": [
        "print(format_output_as_markdown(parse_smolagents_output_to_dictionary(agent, response)))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ui_tK-3d8D4E"
      },
      "source": [
        "## Evaluate the smolagent with Vertex AI Gen AI Evaluation\n",
        "\n",
        "Building effective AI agents requires careful performance evaluation.  This involves two key practices: monitoring and observability.  Monitoring focuses on task-specific performance: how well an agent executes individual actions. Observability provides a broader view, assessing the agent's overall health and efficiency.  \n",
        "\n",
        "The [Vertex AI Gen AI Evaluation service](https://cloud.google.com/blog/products/ai-machine-learning/introducing-agent-evaluation-in-vertex-ai-gen-ai-evaluation-service?e=48754805) streamlines both monitoring and observability, offering pre-built criteria and metrics applicable from prototyping to production.  This allows you to gain deep insights into agent performance, pinpoint areas for improvement, and optimize your AI solutions.  Explore the documentation for details on available evaluation tools."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "A34b-M15-Ow7"
      },
      "source": [
        "### Prepare Agent Evaluation dataset\n",
        "\n",
        "To evaluate your agent with the Vertex AI Gen AI Evaluation service, you need a dataset whose columns depend on which aspects of the agent you want to evaluate. The dataset below pairs each prompt with a reference trajectory, that is, the sequence of tool calls the agent is expected to make."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "XwrvCom18FXY"
      },
      "outputs": [],
      "source": [
        "eval_data = {\n",
        "    \"prompt\": [\n",
        "        \"Count the number of 'r' in the word Strawberry. Verify the answer\",\n",
        "        \"How many times does the digit '2' appear in the number 2,222,222? Verify your answer.\",\n",
        "        \"Count the number of words with more than five letters in this sentence: `The quick brown fox jumps over the lazy dog.` Verify your answer.\",\n",
        "    ],\n",
        "    \"reference_trajectory\": [\n",
        "        [\n",
        "            {\n",
        "                \"tool_name\": \"python_interpreter\",\n",
        "                \"tool_input\": {\n",
        "                    \"arg_0\": \"count = 0\",\n",
        "                    \"arg_1\": 'for letter in \"Strawberry\":',\n",
        "                    \"arg_2\": \"if letter == 'r':\",\n",
        "                    \"arg_3\": \"count += 1\",\n",
        "                    \"arg_4\": \"print(f\\\"There are {count} \\\\'r\\\\'s in the word Strawberry.\\\")\",\n",
        "                    \"arg_5\": \"verification = math_verifier(content={\\\\'type\\\\': \\\\'string\\\\', \\\\'description\\\\': f\\\"There are {count} \\\\'r\\\\'s in the word Strawberry.\\\"})\",\n",
        "                    \"arg_6\": \"final_answer(verification)\",\n",
        "                    \"arg_7\": \"final_answer(verification)\",\n",
        "                },\n",
        "            }\n",
        "        ],\n",
        "        [\n",
        "            {\n",
        "                \"tool_name\": \"python_interpreter\",\n",
        "                \"tool_input\": {\n",
        "                    \"arg_0\": \"count = 0\",\n",
        "                    \"arg_1\": \"num_str = str(2222222)\",\n",
        "                    \"arg_2\": \"for digit in num_str:\",\n",
        "                    \"arg_3\": \"if digit == '2':\",\n",
        "                    \"arg_4\": \"count += 1\",\n",
        "                    \"arg_5\": 'print(f\"The digit 2 appears {count} times in the number 2,222,222.\")',\n",
        "                    \"arg_6\": \"verification = math_verifier(content={'type': 'string', 'description': f\\\"The digit 2 appears {count} times in the number 2,222,222.\\\"})\",\n",
        "                    \"arg_7\": \"final_answer(verification)\",\n",
        "                },\n",
        "            },\n",
        "        ],\n",
        "        [\n",
        "            {\n",
        "                \"tool_name\": \"python_interpreter\",\n",
        "                \"tool_input\": {\n",
        "                    \"arg_0\": \"count = 0\",\n",
        "                    \"arg_1\": 'sentence = \"The quick brown fox jumps over the lazy dog.\"',\n",
        "                    \"arg_2\": \"words = sentence.split()\",\n",
        "                    \"arg_3\": \"for word in words:\",\n",
        "                    \"arg_4\": \"if len(word) > 5:\",\n",
        "                    \"arg_5\": \"count += 1\",\n",
        "                    \"arg_6\": 'print(f\"There are {count} words with more than five letters.\")',\n",
        "                    \"arg_7\": \"verification = math_verifier(content={'type': 'string', 'description': f\\\"There are {count} words with more than five letters in the sentence.\\\"})\",\n",
        "                    \"arg_8\": \"final_answer(verification)\",\n",
        "                },\n",
        "            }\n",
        "        ],\n",
        "    ],\n",
        "}\n",
        "\n",
        "eval_sample_dataset = pd.DataFrame(eval_data)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PQEI1EcfvFHb"
      },
      "source": [
        "Print some samples from the dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "EjsonqWWvIvE"
      },
      "outputs": [],
      "source": [
        "display_dataframe_rows(eval_sample_dataset, num_rows=3)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "htCrOS9fRVi8"
      },
      "source": [
        "### Prepare an Agent function\n",
        "\n",
        "In this scenario with a custom agent, you need an agent function to parse the agent output and pass it to Vertex AI Gen AI Evaluation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "GdO56MIDRZri"
      },
      "outputs": [],
      "source": [
        "def agent_parsed_response(input: str) -> dict:\n",
        "    \"\"\"Parse the agent output and pass it to Vertex AI Gen AI Evaluation.\"\"\"\n",
        "\n",
        "    result = agent.run(input)\n",
        "\n",
        "    # Parse function calls separately\n",
        "    agent_output = parse_smolagents_output_to_dictionary(agent, result)\n",
        "\n",
        "    return agent_output"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oEYmU2eJ7q-1"
      },
      "source": [
        "### Run an evaluation task\n",
        "\n",
        "Once you've assembled your evaluation dataset, the next step is to select the metrics for assessing your agent's performance. This notebook uses `trajectory_exact_match` and `trajectory_in_order_match`, which compare the agent's tool calls against the reference trajectory, and `coherence`, a model-based metric computed on the agent's response. An overview of the available metrics and their interpretation can be found in the [agent evaluation announcement](https://cloud.google.com/blog/products/ai-machine-learning/introducing-agent-evaluation-in-vertex-ai-gen-ai-evaluation-service?e=48754805).\n",
        "\n",
        "With your dataset and metrics in hand, you're ready to launch your first agent evaluation job on Vertex AI: create an `EvalTask` with the dataset and metrics, then call its `evaluate` method. Vertex AI Gen AI evaluation integrates with [Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments), the platform's managed experiment tracking service, and automatically logs each evaluation run as an experiment.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "wBD-4wpB7q-3"
      },
      "outputs": [],
      "source": [
        "EXPERIMENT_NAME = f\"evaluate-smolagent-deepseek-{get_id()}\"\n",
        "EXPERIMENT_RUN_NAME = f\"response-and-tools-{get_id()}\"\n",
        "\n",
        "response_tool_metrics = [\n",
        "    \"trajectory_exact_match\",\n",
        "    \"trajectory_in_order_match\",\n",
        "    \"coherence\",\n",
        "]\n",
        "\n",
        "response_eval_tool_task = EvalTask(\n",
        "    dataset=eval_sample_dataset,\n",
        "    metrics=response_tool_metrics,\n",
        "    experiment=EXPERIMENT_NAME,\n",
        ")\n",
        "\n",
        "response_eval_tool_result = response_eval_tool_task.evaluate(\n",
        "    experiment_run_name=EXPERIMENT_RUN_NAME,\n",
        "    runnable=agent_parsed_response,\n",
        ")\n",
        "\n",
        "display_eval_report(response_eval_tool_result)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9eU3LG6r7q-3"
      },
      "source": [
        "### Visualize evaluation results\n",
        "\n",
        "Display a sample of the per-prompt results from the metrics table, then plot the mean value of each metric."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "pQFzmd2I7q-3"
      },
      "outputs": [],
      "source": [
        "display_dataframe_rows(response_eval_tool_result.metrics_table, num_rows=3)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DJr8GqQKTpUa"
      },
      "outputs": [],
      "source": [
        "plot_bar_plot(\n",
        "    response_eval_tool_result,\n",
        "    title=\"Agent eval metrics\",\n",
        "    metrics=[f\"{metric}/mean\" for metric in response_tool_metrics],\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4RK-hjsL97hH"
      },
      "source": [
        "## Deploy the agent on Vertex AI Reasoning Engine\n",
        "\n",
        "Your agent prototype runs in Colab. The next step is to deploy it so that other users and systems can reach it.\n",
        "\n",
        "[Reasoning Engine on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/reasoning-engine) provides a managed platform for creating and deploying agent reasoning frameworks.\n",
        "\n",
        "This notebook uses a custom application template within Reasoning Engine, which lets you wrap frameworks such as `smolagents`.\n",
        "\n",
        "Let's explore how to deploy the `smolagents` agent using Reasoning Engine on Vertex AI.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9c1IqyNOGbfz"
      },
      "source": [
        "### Assemble the agent\n",
        "\n",
        "Define a `SmolAgent` class that packages the agent, including its tool backed by the DeepSeek model deployed on Vertex AI.\n",
        "\n",
        "The two main methods of a custom agent are `set_up` and `query`:\n",
        "\n",
        "- The `set_up` method instantiates the agent's core components: a `VertexAIServerModel` that connects to the model driving the agent (Gemini 2.0 Flash in this notebook), a `DeepSeekMathVerifierTool` that sends verification requests to the deployed DeepSeek endpoint, and a `CodeAgent` that orchestrates the model and the tool.\n",
        "\n",
        "- The `query` method sends input to the agent and returns its response, which triggers the agent's execution.\n",
        "\n",
        "To learn more about custom agents, see how to [customize an application template](https://cloud.google.com/vertex-ai/generative-ai/docs/reasoning-engine/customize).\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "QaTZPdSB1Nbl"
      },
      "outputs": [],
      "source": [
        "class SmolAgent:\n",
        "\n",
        "    def __init__(\n",
        "        self,\n",
        "        model_id: str,\n",
        "        endpoint_id: str,\n",
        "        tool_endpoint_id: str,\n",
        "        project_id: str,\n",
        "        location: str,\n",
        "        **kwargs,\n",
        "    ):\n",
        "        self.model_id = model_id\n",
        "        self.endpoint_id = endpoint_id\n",
        "        self.tool_endpoint_id = tool_endpoint_id\n",
        "        self.project_id = project_id\n",
        "        self.location = location\n",
        "        self.add_base_tools = False\n",
        "        self.kwargs = kwargs\n",
        "\n",
        "    def set_up(self) -> None:\n",
        "        \"\"\"Set up the agent.\"\"\"\n",
        "\n",
        "        self.model = VertexAIServerModel(\n",
        "            model_id=self.model_id,\n",
        "            endpoint_id=self.endpoint_id,\n",
        "            project_id=self.project_id,\n",
        "            location=self.location,\n",
        "            **self.kwargs,\n",
        "        )\n",
        "        self.tools = [\n",
        "            DeepSeekMathVerifierTool(\n",
        "                project_id=self.project_id,\n",
        "                location=self.location,\n",
        "                endpoint_id=self.tool_endpoint_id,\n",
        "                **self.kwargs,\n",
        "            )\n",
        "        ]\n",
        "        self.app = CodeAgent(\n",
        "            model=self.model,\n",
        "            tools=self.tools,\n",
        "            add_base_tools=self.add_base_tools,\n",
        "            **self.kwargs,\n",
        "        )\n",
        "\n",
        "    def query(self, input: str):\n",
        "        \"\"\"Query the application.\"\"\"\n",
        "        return self.app.run(input)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qSQunNwr95Du"
      },
      "source": [
        "### Test the agent\n",
        "\n",
        "With the agent assembled, test it locally to confirm that it behaves as expected."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "iatBy35eBmBc"
      },
      "outputs": [],
      "source": [
        "local_custom_agent = SmolAgent(\n",
        "    model_id=\"google/gemini-2.0-flash\",\n",
        "    endpoint_id=\"openapi\",\n",
        "    tool_endpoint_id=endpoint_id,\n",
        "    project_id=PROJECT_ID,\n",
        "    location=LOCATION,\n",
        ")\n",
        "local_custom_agent.set_up()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VHoBatTRUA_4"
      },
      "outputs": [],
      "source": [
        "output = local_custom_agent.query(input=\"Hello! How are you?\")\n",
        "print(output)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5zLz5qWqfpjE"
      },
      "outputs": [],
      "source": [
        "output = local_custom_agent.query(\n",
        "    input=\"Count the number of 'r' in the word Strawberry. Verify the answer\"\n",
        ")\n",
        "print(output)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wvHDdFB6UKBS"
      },
      "source": [
        "### Deploy the SmolAgent\n",
        "\n",
        "Your `smolagents` application runs as expected locally.\n",
        "\n",
        "Now deploy it to Reasoning Engine on Vertex AI. Deployment makes the application accessible remotely, so you can integrate it with other systems or use it as a standalone service. The cell below creates a fresh `SmolAgent` without calling `set_up`; Reasoning Engine calls `set_up` on the deployed instance."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "CJclYoSDUPo3"
      },
      "outputs": [],
      "source": [
        "local_custom_agent = SmolAgent(\n",
        "    model_id=\"google/gemini-2.0-flash\",\n",
        "    endpoint_id=\"openapi\",\n",
        "    tool_endpoint_id=endpoint_id,\n",
        "    project_id=PROJECT_ID,\n",
        "    location=LOCATION,\n",
        ")\n",
        "\n",
        "remote_custom_agent = reasoning_engines.ReasoningEngine.create(\n",
        "    local_custom_agent,\n",
        "    requirements=[\n",
        "        \"google-cloud-aiplatform[reasoningengine]\",\n",
        "        \"openai\",\n",
        "        \"smolagents\",\n",
        "        \"cloudpickle==3.0.0\",\n",
        "        \"pydantic>=2.10\",\n",
        "        \"requests\",\n",
        "    ],\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2NL5UL6jVT66"
      },
      "source": [
        "### Call the agent\n",
        "\n",
        "Now that the agent is deployed, let's call the agent to answer our math questions."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "_HbvguziVVtw"
      },
      "outputs": [],
      "source": [
        "output = remote_custom_agent.query(\n",
        "    input=\"Count the number of 'r' in the word Strawberry. Verify the answer\"\n",
        ")\n",
        "print(\"Agent response:\", output)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2a4e033321ad"
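      },
      "source": [
        "## Cleaning up\n",
        "\n",
        "To avoid ongoing charges, set the flags below to `True` to delete the resources created in this notebook."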
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mYDB5a_MDWiO"
      },
      "outputs": [],
      "source": [
        "delete_bucket = False\n",
        "delete_endpoint = False\n",
        "delete_model = False\n",
        "delete_remote_agent = False\n",
        "\n",
        "if delete_bucket:\n",
        "    ! gsutil rm -r $BUCKET_URI\n",
        "if delete_endpoint:\n",
        "    deepseek_endpoint.delete(force=True)\n",
        "if delete_model:\n",
        "    deepseek_model.delete()\n",
        "if delete_remote_agent:\n",
        "    remote_custom_agent.delete()"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "vertex_ai_deepseek_smolagents.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
